Background: Artificial intelligence (AI) can potentially enhance cognitive assessment practices in maternal and child health nursing education. Objectives: To evaluate the reliability, accuracy and precision, and external validity of an AI-assisted answer assessment (4A) program for cognitive assessments in nursing education. Methods: This study is a validation study. Initially, 170 nursing students from northern Thailand participated, with 52 randomly selected for detailed testing. Agreement between the 4A program and human experts was assessed using the intraclass correlation coefficient (ICC). Accuracy and precision testing compared 4A scores with human expert assessments via the McNemar test. External validation involved 138 participants, whose 4A assessments were compared against national examination outcomes using logistic regression. Results: The 4A program and human experts showed a high level of consistency (ICC = 0.886). The 4A program achieved an accuracy of 0.808 and a precision of 0.913, compared with the human experts' accuracy of 0.923 and precision of 1.000. The McNemar test (χ² = 0.4, p = 0.527) showed no significant difference in evaluation performance between AI and human experts. Higher scores on the 4A program significantly predicted success in the national nursing examination (OR = 1.124, p = 0.031). Conclusions: The 4A program demonstrates potential for reliably assessing nursing students' cognitive abilities and predicting examination success. These findings support the continued integration of AI in educational assessment and underscore the importance of refining AI systems to better align with traditional assessment methods.
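To illustrate the kind of paired comparison the McNemar test performs here, the sketch below computes the uncorrected McNemar χ² statistic (1 degree of freedom) from discordant-pair counts. The counts used (6 and 4) are purely hypothetical examples chosen to show the calculation; they are not the study's data, though they happen to yield a statistic of the same form as the reported χ² = 0.4, p = 0.527.

```python
import math

def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Uncorrected McNemar chi-square test for paired binary outcomes.

    b = items one rater classified correctly and the other did not;
    c = the reverse. Only these discordant pairs enter the statistic.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # For a chi-square variable with 1 df, the survival function is
    # P(X > x) = erfc(sqrt(x / 2)), so no external stats library is needed.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical discordant counts: 6 answers scored correctly by the AI
# but not the expert, 4 scored correctly by the expert but not the AI.
chi2, p = mcnemar_chi2(6, 4)
print(f"chi2 = {chi2:.1f}, p = {p:.3f}")
```

A non-significant p-value here means the two raters' marginal error rates are statistically indistinguishable, which is the sense in which the abstract reports no difference in evaluation performance.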